-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gc time #311
Gc time #311
Conversation
Does this PR replace the other JVM GC related heuristic PR (279)? |
/** The ascending severity thresholds for the ratio of JVM GC Time and executor Run Time (checking whether ratio is above normal) | ||
* These thresholds are experimental and are likely to change */ | ||
val DEFAULT_GC_SEVERITY_A_THRESHOLDS = | ||
SeverityThresholds(low = 0.08D, moderate = 0.1D, severe = 0.15D, critical = 0.2D, ascending = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make these threshold values configurable just like the following:
|
||
/** The descending severity thresholds for the ratio of JVM GC Time and executor Run Time (checking whether ratio is below normal) | ||
* These thresholds are experimental and are likely to change */ | ||
val DEFAULT_GC_SEVERITY_D_THRESHOLDS = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
|
||
/** | ||
* A heuristic based on GC time and CPU run time | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra line.
app-conf/HeuristicConf.xml
Outdated
<heuristic> | ||
<applicationtype>spark</applicationtype> | ||
<heuristicname>Spark GC Time to CPU Time</heuristicname> | ||
<classname>com.linkedin.drelephant.spark.heuristics.GcCpuTimeHeuristic</classname> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the references to CPUTime to runtime, since we are using executor run/duration time now.
@@ -0,0 +1,138 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file name seems to be ExecutorStorageSpillHeuristicTest.scala -- should this be renamed to ExecutorGcRuntimeHeuristicTest.scala?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push the changes you have made? The PR doesn't reflect this change.
@@ -0,0 +1,6 @@ | |||
<p>The ratio of jvmGcTime to executorCpuTime is checked, to see if GC is taking too much time (providing more memory could help) or too little time (memory may be over provisioned, and can be reduced).</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this mention the descending thresholds as well?
@shkhrgpt, yes it does. |
@@ -0,0 +1,12 @@ | |||
<p>The ratio of jvmGcTime to executorRunTime is checked, to see if GC is taking too much time (providing more memory could help) or too little time (memory may be over provisioned, and can be reduced).</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase this. Help page should be simple and more intuitive for the users.
GC Time Heuristic flags jobs which are spending too much time on Garbage Collection.
If GC is taking too much time, then providing more memory could help. On the other hand, if GC is taking too little time, it could mean that the memory is over provisioned and it can be reduced.
Remove all the threshold values from the help page. I don't think it will help the users in ay way.
@@ -0,0 +1,138 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push the changes you have made? The PR doesn't reflect this change.
app-conf/HeuristicConf.xml
Outdated
@@ -193,5 +193,11 @@ | |||
<classname>com.linkedin.drelephant.spark.heuristics.StagesHeuristic</classname> | |||
<viewname>views.html.help.spark.helpStagesHeuristic</viewname> | |||
</heuristic> | |||
<heuristic> | |||
<applicationtype>spark</applicationtype> | |||
<heuristicname>Spark GC Time to Run Time</heuristicname> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer naming this Heuristic as simply Executor GC
.
Also reflect this change elsewhere in the Heuristic class and Help page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
private def getTimeValues(executorSummaries: Seq[ExecutorSummary]): (Long, Long) = { | ||
var jvmGcTimeTotal: Long = 0 | ||
var executorRunTimeTotal: Long = 0 | ||
executorSummaries.foreach(executorSummary => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure executorSummaries doesn't include driver info. If so, exclude the driver from the executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import scala.collection.JavaConverters | ||
|
||
/** | ||
* A heuristic based on GC time and CPU run time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase to something more intuitive. It should tell what this class/heuristic does. Refer the comment I added on the help page.
val evaluator = new Evaluator(this, data) | ||
var resultDetails = Seq( | ||
new HeuristicResultDetails("GC time to Executor Run time ratio", evaluator.ratio.toString), | ||
new HeuristicResultDetails("GC total time", evaluator.jvmTime.toString), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gc total time
to Total GC Time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
var resultDetails = Seq( | ||
new HeuristicResultDetails("GC time to Executor Run time ratio", evaluator.ratio.toString), | ||
new HeuristicResultDetails("GC total time", evaluator.jvmTime.toString), | ||
new HeuristicResultDetails("Executor Run time", evaluator.executorRunTimeTotal.toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Executor Run time
to Total Executor Runtimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
//adding recommendations to the result, severityTimeA corresponds to the ascending severity calculation | ||
if (evaluator.severityTimeA.getValue > Severity.LOW.getValue) { | ||
resultDetails = resultDetails :+ new HeuristicResultDetails("Note", "The ratio of JVM GC Time and executor Time is above normal, we recommend to increase the executor memory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase: The job is spending too much time on GC. We recommend increasing the executor memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
//severityTimeD corresponds to the descending severity calculation | ||
if (evaluator.severityTimeD.getValue > Severity.LOW.getValue) { | ||
resultDetails = resultDetails :+ new HeuristicResultDetails("Note", "The ratio of JVM GC Time and executor Time is below normal, we recommend to decrease the executor memory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @skakker
The ratio of jvmGcTime to executorCpuTime is checked, to see if GC is taking too much time (providing more memory could help) or too little time (memory may be over provisioned, and can be reduced).
The severity thresholds are as follows :
Low: avg (jvmGcTime / executorCpuTime) >= .08
Moderate: avg (jvmGcTime / executorCpuTime) >= .1
Critical: avg (jvmGcTime / executorCpuTime) >= .15
Severe:avg (jvmGcTime / executorCpuTime) >= .2